Abstract

Background: In the last 30 years, a number of DNA fingerprinting methods, such as RFLP, RAPD, AFLP, SSR and DArT, have
been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to
identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection.
Next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting
method in generating DNA markers. In this study, we employed a grain legume crop, Lupinus angustifolius (lupin), as a
test case, and examined the utility of an NGS-based method, RAD (restriction-site associated DNA) sequencing, as DNA
fingerprinting for rapid, cost-effective development of markers tagging a disease resistance gene for molecular breeding.

Results: Twenty informative plants from a cross of RxS (disease resistant x susceptible) in lupin were subjected to RAD
single-end sequencing using multiplex identifiers. All RAD sequencing products were run in two lanes of the Solexa
HiSeq2000 sequencing platform, which has 16 lanes per run. A total of 185 million raw reads, approximately 17 Gb of
sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers.
Filtering the DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular
markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective,
simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on an
F8 population consisting of 186 individual plants confirmed that all five markers were linked to the R gene. Two of
these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic
distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA
fingerprinting method for marker-assisted selection in the Australian national lupin breeding program.

Conclusions: We demonstrated that more than 30 molecular markers linked to a target gene for an agronomic trait of
interest can be identified from a small portion (1/8) of one sequencing run on HiSeq2000 by applying NGS-based RAD
sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant
SNP markers, which can readily be converted into a high-throughput multiplex format or into low-cost, simple PCR-based
markers desirable for large-scale marker implementation in plant breeding programs. The high-density, closely linked
molecular markers associated with a target trait help to overcome a major bottleneck for implementation of molecular
markers on a wide range of germplasm in breeding programs. We conclude that application of NGS-based RAD
sequencing as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant
breeding. The strategy does not require any prior genome knowledge or molecular information for the species under
investigation, and it is applicable to other plant species.
Background

Plant breeding is a mission of continuously discovering 
and pyramiding desirable genes of agronomic or end-use 
interest into breeding lines to produce superior cultivars. 
Molecular markers linked to genes of interest can be 
developed and applied for marker-assisted selection 
(MAS) to increase the efficiency of genetic improvement 
[1-3]. Marker development for MAS in plant breeding 
usually requires that a cross be made between two par- 
ental plants which differ in genes or traits of interest to 
produce a segregating progeny population. The genomes 
of these segregating plants are then fingerprinted to 
identify markers linked to the genes of interest. In the 
last three decades, a number of generic DNA finger- 
printing methods, such as restriction fragment length 
polymorphism (RFLP) [4], random amplified poly- 
morphic DNA (RAPD) [5,6], simple sequence repeat 
(SSR) [7], Diversity Arrays Technology (DArT) [8], amplified
fragment length polymorphism (AFLP) [9,10] and
microsatellite-anchored fragment length polymorphism
(MFLP) [11-14], have been used in marker development
for molecular plant breeding. These methods are effective,
but are labour-intensive and time-consuming. At
present, the development of markers tightly linked to 
genes of interest still remains a difficult task. 

To expedite marker development, Michelmore et al.
[15] described the "bulked segregant analysis" (BSA) 
method, in which a small number of informative segre- 
gating individual plants (usually 20) are bulked to form 
two pools differing only for the selection region before 
conducting DNA fingerprinting for identification of can- 
didate markers linked to the genes of interest. The iden- 
tified candidate markers are then tested on a large 
number of segregating individual plants to confirm the 
genetic linkage between the markers and the target 
genes before the markers are implemented in MAS. BSA 
has been widely used in marker development for mo- 
lecular plant breeding [16,17]. In our experience in mar- 
ker development using the DNA fingerprinting method 
MFLP, which is a method based on the combination of 
the AFLP concept with microsatellite motifs [18], we 
adapted the BSA principle of employing a small number 
of informative progeny plants, but we kept each individ- 
ual plant separate in DNA fingerprinting. This approach 
effectively eliminated the problem of detecting "false 
positive" candidate markers (DNA bands appearing as 
candidate markers in the bulk, but proven as otherwise 
when tested on individual plants separately) [12,13]. 
Using this protocol, we have developed a number of mo- 
lecular markers linked to various genes of interest ap- 
plicable to plant breeding [11-14,19-24]. 

The next-generation sequencing (NGS) technology provides a powerful tool
for detecting large numbers of DNA markers within a short time-frame.
Several marker development methods utilising NGS platforms to sequence
complexity-reduced representations have been reported, including
reduced-representation libraries (RRLs) [25,26], complexity reduction of
polymorphic sequences (CRoPS) [27], restriction-site associated DNA
sequencing (RAD-seq) [28], sequence-based polymorphic marker technology
(SBP) [29], low-coverage multiplexed shotgun genotyping (MSG) [30], and
genotyping by sequencing (GBS) [31]. "Restriction-site associated DNA
(RAD)" was originally described by Miller et al. [32] based on a microarray
platform. Baird et al. [33] adapted RAD to the massively parallel NGS
platform to efficiently detect DNA polymorphisms without requiring any
prior molecular knowledge of the species under investigation. RAD
sequencing produces two types of DNA markers: dominant markers, which arise
from DNA variations within the restriction sites, and co-dominant markers,
which arise from sequence variation adjacent to the restriction sites [28].
RAD markers have been employed in genetic mapping in fungi [34], fish [33],
insects [35], and more recently plants [28,36,37].

Narrow-leafed lupin (Lupinus angustifolius L.) is a grain legume crop
cultivated in Australia, Europe, America and Africa. Anthracnose, caused by
the fungal pathogen Colletotrichum lupini, is the most devastating disease
of lupin [38]. In Australia, a single dominant gene conferring resistance
to anthracnose, designated "Lanr1", is extensively applied in the national
lupin breeding program to combat the disease [12]. Two molecular markers
linked to the Lanr1 gene were previously established using traditional
marker development methods, at genetic distances of 3.5 and 2.3 centiMorgan
(cM), respectively [12,19]. The objectives of this research were to examine
the utility of RAD sequencing, applied as DNA fingerprinting, for rapid
marker development for MAS in plant breeding, and to develop molecular
markers more closely linked to the disease resistance gene Lanr1 for
molecular breeding in lupin.

Results 

Generating SNP markers by RAD sequencing 

The marker development procedures employed in this study are illustrated in
Figure 1. During the RAD sequencing stage, a total of 185 million raw reads,
comprising approximately 17 Gb of sequencing data, were produced on the
HiSeq2000 from the two RAD sequencing libraries constructed from the 20
plants using the multiplex identifier (MID) strategy. After reads were
grouped within individual plants, each plant had its own set of tag reads
for marker discovery. Tag reads from the same restriction-associated site in
the genomes of the two parents were compared. A total of 8207 single
nucleotide polymorphisms were obtained across the 20 plants in the RAD
sequencing. The average coverage depth of the nucleotides of the 8207 SNP
markers was 15.4X.

Plant 1 x Plant 2
↓
F8 population and phenotyping
↓
Select 20 informative plants including the two parents
↓
Construct two 100-bp RAD sequencing libraries, each library with 10 plants,
each plant with a unique MID
↓
Process RAD sequencing in two lanes on HiSeq2000
↓
Cluster RAD-tags for each plant by sequence similarity
↓
Compare two parents to filter monomorphic sequences
↓
Compare 20 plants to get high confidence SNPs
↓
Search for SNP markers matching phenotypes on 20 plants to
identify candidate markers linked to target gene
↓
Convert RAD-SNP markers into PCR-based markers by
designing a pair of primers to flank each SNP
↓
Test PCR-based markers on a large segregating
population to confirm genetic linkage to the target gene
↓
Implementation of PCR markers in molecular plant
breeding

Figure 1 A flow diagram illustrating the marker development
procedures in this study. The first stage was to make a cross to
develop, then phenotype, a genetic population. The second stage
was to conduct NGS-based RAD sequencing on a small number (20)
of plants representing the presence and absence of the gene of
interest to generate a large number of sequence reads, followed by
bioinformatics analysis to identify SNP markers showing correlation
between marker genotypes and plant phenotypes. The third stage
was to convert SNP markers into simple PCR-based markers. Finally,
the PCR-based markers were tested on a large segregating
population to confirm the genetic linkage between the markers and
the gene of interest before the markers were implemented in
molecular plant breeding.

Identification of candidate RAD markers linked to the
Lanr1 gene

After filtering the 8207 SNP markers with the parameters for candidate
marker identification, 38 co-dominant RAD markers were obtained (Table 1).
For each of these 38 SNP markers, the nine F8 RIL plants with anthracnose
resistance showed the polymorphic nucleotide of the disease-resistant
parent Tanjil, while the nine F8 RIL plants susceptible to anthracnose
showed the polymorphic nucleotide of the susceptible parent Unicrop
(Table 1). These 38 RAD markers were considered as candidate markers linked
to the disease resistance gene Lanr1 based on the principles of candidate
marker discovery described earlier [11-14,19-24].
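The genotype-phenotype matching used to shortlist candidate markers can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function name `is_candidate`, the plant labels and the `max_missing` tolerance are assumptions made for the example, while the allele coding ("a" for Unicrop, "b" for Tanjil, "-" for missing data) follows Table 1.

```python
MISSING = "-"  # missing data, coded as in Table 1


def is_candidate(genotypes, susceptible, resistant, max_missing=1):
    """Return True if a SNP co-segregates with the disease phenotype.

    genotypes   : dict mapping plant label -> "a", "b" or "-"
    susceptible : labels of susceptible plants (expected allele "a")
    resistant   : labels of resistant plants (expected allele "b")
    max_missing : tolerated number of missing calls (an assumed threshold)
    """
    calls = [genotypes.get(p, MISSING) for p in susceptible + resistant]
    if calls.count(MISSING) > max_missing:
        return False
    sus_ok = all(genotypes.get(p, MISSING) in ("a", MISSING) for p in susceptible)
    res_ok = all(genotypes.get(p, MISSING) in ("b", MISSING) for p in resistant)
    return sus_ok and res_ok


susceptible = ["PS"] + [f"F8S{i}" for i in range(1, 10)]
resistant = ["PR"] + [f"F8R{i}" for i in range(1, 10)]

# Pattern of candidate marker 1 in Table 1: one missing call at F8S8.
marker_1 = {p: "a" for p in susceptible}
marker_1.update({p: "b" for p in resistant})
marker_1["F8S8"] = MISSING

# A hypothetical unlinked SNP: one resistant plant carries the Unicrop allele.
unlinked = dict(marker_1)
unlinked["F8R3"] = "a"

print(is_candidate(marker_1, susceptible, resistant))  # True
print(is_candidate(unlinked, susceptible, resistant))  # False
```

Scoring each plant separately, rather than as two bulks, is what allows a single mismatching plant to reject a SNP in this kind of filter.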

The DNA sequences of the 38 RAD markers are presented in Table 2. The RAD
reads were all 93 base pairs long when the first nucleotide "G" from the
EcoRI restriction site (5'-G/AATTC-3') was included. The majority of the
RAD markers contained the SNP mutation site in the middle of the RAD
sequence read (Table 2), which provided enough sequence length to design
primer pairs flanking the SNP mutation sites in marker conversion.
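Whether a read leaves enough sequence on both sides of the SNP for a primer pair can be checked with a short helper. The sketch below assumes the bracket notation of Table 2 (first allele from Unicrop, second from Tanjil); the function names, the toy read and the 18-nucleotide minimum primer length are illustrative assumptions, not values taken from the study.

```python
import re

# SNPs are written as [X/Y]: X from Unicrop, Y from Tanjil (Table 2 convention).
SNP_PATTERN = re.compile(r"\[([ACGT])/([ACGT])\]", re.IGNORECASE)


def snp_flanks(read):
    """Locate the annotated SNP and report the length of each flank."""
    match = SNP_PATTERN.search(read)
    if match is None:
        raise ValueError("read has no [X/Y] SNP annotation")
    left, right = read[:match.start()], read[match.end():]
    return {
        "unicrop_allele": match.group(1).upper(),
        "tanjil_allele": match.group(2).upper(),
        "left_flank": len(left),
        "right_flank": len(right),
    }


def can_flank_with_primers(read, min_primer_len=18):
    """True if both flanks are long enough to hold a primer (assumed minimum)."""
    info = snp_flanks(read)
    return (info["left_flank"] >= min_primer_len
            and info["right_flank"] >= min_primer_len)


# Toy read invented for illustration; it is not one of the 38 markers.
toy_read = "AATTCACGTACGTTAGCCATGA[G/A]TTGACCGTAGGCTAACGTAC"
print(snp_flanks(toy_read))
print(can_flank_with_primers(toy_read))
```

A read whose SNP sits close to either end would fail this check and would need sequence extension before a flanking primer pair could be designed.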

Conversion of selected candidate RAD-SNP markers into 
PCR-based markers 

Each of the five randomly selected SNP markers was successfully converted
into a simple, sequence-specific PCR-based marker with a pair of
sequence-specific primers flanking the SNP site (Table 3). These PCR-based
SNP markers appeared as co-dominant polymorphic bands on the SSCP gels
(Figure 2). The five newly established PCR markers were designated as
AnSeq1, AnSeq2, AnSeq3, AnSeq4 and AnSeq5 (Table 3).

Linkage confirmation between the established PCR
markers and the disease resistance gene Lanr1

Marker genotyping data were obtained for the five newly established PCR
markers on 186 F8 RILs from the cross Unicrop x Tanjil. Linkage analysis
using the marker genotyping score data and the anthracnose disease
phenotyping data on the 186 F8 RILs showed that all five PCR-based markers
developed in this study were linked to the disease resistance gene Lanr1
(Figure 3). The linked markers reported in this study were on linkage group
"NLL-11" of the lupin genetic map reported by Nelson et al. [39], as
evidenced by the presence of the same R gene (Lanr1) and the previously
developed marker "AntjM2". Three of the five markers, AnSeq1, AnSeq3 and
AnSeq4, were closer to the R gene than the previously developed markers
AntjM1 and AntjM2 [12,19]. Two of the newly developed markers, AnSeq3 and
AnSeq4, flanked the R gene at a genetic distance of 0.9 cM (Figure 3).
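For readers unfamiliar with how marker-phenotype mismatches in a RIL population translate into a map distance, the following sketch shows one textbook route. It is not the procedure implemented in MapManager QTX or RECORD: it assumes RILs derived by selfing, for which the Haldane-Waddington relation R = 2r / (1 + 2r) links the observed proportion of recombinant lines R to the per-meiosis recombination fraction r, and it uses the Haldane mapping function. The count of 3 recombinant lines is hypothetical.

```python
import math


def ril_recombination_fraction(n_recombinant, n_total):
    """Per-meiosis recombination fraction r from recombinant RIL counts.

    Inverts R = 2r / (1 + 2r), which holds for RILs derived by selfing
    (an assumption; an F8 population only approximates full inbreeding).
    """
    if n_total <= 0 or not 0 <= n_recombinant <= n_total:
        raise ValueError("counts must satisfy 0 <= n_recombinant <= n_total, n_total > 0")
    big_r = n_recombinant / n_total
    if big_r >= 0.5:
        return 0.5  # marker and gene behave as unlinked
    return big_r / (2.0 * (1.0 - big_r))


def haldane_cm(r):
    """Map distance in centiMorgans under the Haldane mapping function."""
    if r >= 0.5:
        return math.inf
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical example: 3 of 186 RILs show a marker-phenotype mismatch.
r = ril_recombination_fraction(3, 186)
print(f"r = {r:.4f}, distance = {haldane_cm(r):.2f} cM")
```

At the small distances reported here, the choice of mapping function makes little difference, because all common functions are nearly linear close to zero.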

Discussion 

The marker development strategy which we applied in this study consisted of
four stages. Firstly, a cross was made between two parental plants to
create a segregating progeny population, followed by phenotyping of the
individual progeny plants for the gene of interest. Secondly, a small
number of informative plants were subjected to DNA fingerprinting by
NGS-based RAD sequencing to identify candidate markers linked to the target
gene. Thirdly, selected candidate markers were
Table 1 Identification of 38 candidate SNP markers linked to anthracnose disease resistance gene Lanr1 in cultivar
Tanjil of Lupinus angustifolius L. by NGS-based RAD sequencing

                    Ten plants susceptible to anthracnose disease     Ten plants resistant to anthracnose disease
Candidate markers   PS*  F8S1 F8S2 F8S3 F8S4 F8S5 F8S6 F8S7 F8S8 F8S9 PR*  F8R1 F8R2 F8R3 F8R4 F8R5 F8R6 F8R7 F8R8 F8R9
Candidate marker 1  a**  a    a    a    a    a    a    a    -    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 2  a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 3  a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    -    b    b    b
Candidate marker 4  a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 5  a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 6  a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 7  a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 8  a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 9  a    a    a    a    a    a    a    a    a    a    b    b    b    b    -    b    b    b    b    b
Candidate marker 10 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 11 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 12 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 13 a    a    a    a    -    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 14 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 15 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    -    b    b    b
Candidate marker 16 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 17 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 18 a    a    a    a    a    a    a    a    a    a    b    b    b    b    -    b    b    b    b    b
Candidate marker 19 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 20 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 21 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 22 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 23 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 24 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 25 a    a    a    a    -    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 26 a    a    a    -    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 27 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    -    b    b    b
Candidate marker 28 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 29 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 30 a    a    a    a    a    a    a         a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 31 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 32 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 33 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 34 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 35 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b
Candidate marker 36 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 37 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b
Candidate marker 38 a    a    a    a    a    a    a    a    a    a    b    b    b    b    b    b    b    b    b    b

*PS = parental cultivar Unicrop, which is susceptible to anthracnose disease; PR = parental cultivar Tanjil, which is resistant to anthracnose.

**The nucleotides of the SNP markers corresponding to the disease-susceptible parent Unicrop were recorded as "a"; the nucleotides of the SNP markers
corresponding to the disease-resistant parent Tanjil were recorded as "b"; missing data were recorded as "-".
Table 2 RAD sequence reads of the 38 SNP candidate markers linked to anthracnose disease resistance gene Lanr1 in
cultivar Tanjil of Lupinus angustifolius discovered by RAD sequencing on NGS platform Solexa HiSeq2000


Candidate 
markers* 


Coverage 
depth** 


DNA sequences (5'-3')*** 


Candidate 
marker 1 


7.0/6.4 


AATOAGATOAAGCGTGAAATAmATACGTAGGAGAAAATGAGAGAAGGGAGGTCTGTGAGAGAAAGAGTA[G/A] 
AAAGAAAAAAATAAAAAA 


Candidate 
marker 2 


5.7/6.7 


A A^C A AC AC ACCGCTC ATCTC A [A/C] 

CTACmCAAAATCAACCGTCTATGGATOTCTCCACTAAGATAATATATATOTAATAAAAAATGAA 


Candidate 
marker 3 


7.1/7.5 


AATOCACAAATOAAAAACCCGACCGCI I I I I I CATGAAATGCCAATGAAAATGHT/C] 
TGTOGTACTAATACTAATOATOAOTCTATAAG 


Candidate 
marker 4 


11.9/13.9 


AATCCACCAGGATOCACAACACTAATCTCAAOTGGIU IGI IGI I I I ICATAATOGCATCTACACCAATO[C/G] 
ATATCAACACTGCm 


Candidate 
marker 5 


7.2/7.7 


AATTCTTCTTCAGAAACAAGGAGCCAATC[G/C] 

GATOCAAATGCACTGAAGGAAGCTCAAGCAGAAAGCAAAATCTAAAGAGATOATCTGATA 


Candidate 
marker 6 


9.0/8.8 


AATOTOCAATGAACTOTA^CA^GCACTACTAAATCATAmGCAATATATATGTGTOTATOTGCAAACTGAATCm 
[A/G]TAT 


Candidate 
marker 7 


13.7/13.3 


AATOATCTGTACTGAmCmCATATCAAATAGCAGCAGTGGCAATCCTAAAAATAGAATGACCTCT[C^H 
GATGTGTGTGCATOTGTGTAT 


Candidate 
marker 8 


8.7/9.1 


AATOmAAACI I I I G I I CU I I A 1 1 IGAAGI ICILI IGU I I I ICTAAATCAAGTAAATOGGAGTQA/C] 
TAACAAAGmACCTOGAGCA 


Candidate 
marker 9 


9.1/8.8 


aatoagagacaggactcatggc™togtaatogaaca^[c^h 
ggaaatagcaatgaaatcaaagaaatotagataaccaaaaaaacact 


Candidate 
marker 10 


15.4/15.8 


AATO™AAACAAAmATACAAGmCTC^ATOATOATGAGAATAAGTCAAAATOAAAATGGAAGGTAACCC[A^1 
TATCAAAGCGT 


Candidate 
marker 11


4.2/10.9 


aatotatatc™gtgato[c/a] 

gtgtaactotactactagaggtaggtgtgggatggcaamgactagtgaaatgaagtaaagtgactgc 


Candidate 
marker 12 


6.8/8.5 


AATCGAACTCATATATGAATACGTGGGCATCmATCAATOAGmGTCAm^A^I 
^AAmCTOAAAAG^GAGAGCAOT 


Candidate 
marker 13 


10.9/11.4 


AATOTGATGTGAAACAACGTGAAAAGAAAGGAGAAAAATCTGTCTOTGAACAAGAAATGGACA[G/A] 
ATATCAAAGCTCAGCCAGGAGCAm 


Candidate 
marker 14 


7.42/9.24 


AATOTCTCACAAGCTAATCGACATOATCTGTATGCmGCnr/A] 
ACAGTAGACTCTAGGACmCTCAAGTOAGTCACTCTATOCTOGCT 


Candidate 
marker 15 


12.8/14.2 


AATOTCATAATA^ATAGATCTCAmAAGAGmAAATAG™G[G/C] 
ATAAAGI 1 1 1 1 lACACATOTOTGAmGTATOGTATGTAT 


Candidate 
marker 16 


12.0/11.6 


AATCCCCTCAAAmACACTGmCTCGmGGGTOAAGAGCCCmGCTOTOCmGAGmAAACG/C] 
CTCCAAACmAAATAGAG^ 


Candidate 
marker 17 


14.9/15.6 


aatocagaggatacacatgacacactacaaca™gtacccg[a/g] 
caatgcctcaaaactgcggtctaatatgaaaaaatcgatgtcmg^ 


Candidate 
marker 18 


13.3/12.5 


aatcacactcaatcatgtotgcagc™aact[a/g] 
aaaaaacaataggacc^gctctoataaaamctgamaaaaaatgtacaag 


Candidate 
marker 19 


18.9/19.2 


AATTCATTCAAGGGTCTrGTCAATCAATTGA[A/C] 

AAAGATATATGATGAGTOCAAGCACTOTGCTACTAAGTCGTOOTGAAAACTGGGA 


Candidate 
marker 20 


12.9/13.6 


AATOTAGI 1 1 IGILI IGGILUAI IGI 1 IGU IGAI 1 1 1 1 CAATTCATTACTAAACTATT[C/T] 
TGACAGTOCTGCATACTAmGCCTOAA 


Candidate 
marker 21 


16.2/19.2 


AATOTGAAGCAAGTGTATACAmAAGTOTAGAAATAGAAAGGATACACTCACGG[G/A] 
ATGAGATAGCCAAGATAAACTATACATGGAATAT 


Candidate 
marker 22 


28.9/26.1 


AATCTAG IIIUIIIAIUlGIIUIII LLLAGAAGA 1 A 1 1 AL 1 1 G 1 (_ 1 1 1 AA 1 1 1 1 L 1 1 1 1 GGGTGGGA[A/G] 
TGGGAGTGAGGGGAATOAA 


Candidate 
marker 23 


18.7/19.8 


aai iu 1 1 igi igataacctcaaacaagatggcctagtg™atcatog™gaaca[c/g] 

TGAAATOAI 1 1 1 IGI 1 1 1 1 AAGACAACATATA 


Candidate 
marker 24 


20.2/19.8 


AATOHT/C] 

taataggtgtagtaggatatataataagaatac™aa™c™aaaaagtacatagatagataaatatcactatogacactct 


Candidate 
marker 25 


14.4/16.6 


aatotccgtctctccccctoacctocggagcaaaatccctcaataggtcccaagtoacgaatca™tcca[c^g 
cgccgaaatcctaa^ 


Candidate 
marker 26 


7.9/7.1 


aatotcaag™aatatgatgtggctaacagggtoactotaggtctcgagg^[g/a] 
ctgatgctgaaagatctotcatactgaatoatc 



Yang et al. BMC Genomics 2012, 13:318 
http://www.biomedcentral.com/1471-2164/13/318






Candidate
marker 27


25.1/27.5


AATCCAAATCmAGCTGAAAAGTCAATOACAACTATCAACA^ATCAGTAGAAGTGtC^l
ATAACAGACTAGATATOGTATOATAAT


Candidate
marker 28


21.4/20.2


aattcataactacmgtaatoatcaagc^ctoctgatcatcatotc^[c^n
tctctgatctctatgtoagtatcaagaaataa


Candidate
marker 29


14.3/16.0


aatoaagctgctatcaacm[a/g]

aagctctctccccacatgctaactoatcaaatggctcatatatgcccatc^mcccaatgactagt


Candidate
marker 30


8.9/11.0


aatoaacaacaaaaaatcatcgtaacgccagrr/c]
aatcctcatocaacaactataatggccgcaaccataamaaaataccga^gm


Candidate
marker 31


20.5/21.1


aatoatatocacacagamgtoaactgtoaamgc™catgtcctgaatcaaaaagaagagaaaagmagacac[g^1
atgcggtca


Candidate
marker 32


18.4/17.0


aatogaaatggagagcaatgtcacmcataaatgggataaacaaaamcgmctogtgcaacagtogaatgccnr/g]
gtaataacaagc


Candidate
marker 33


31.7/31.2


aatotggtgctgcgacagaagtgtotgcaca/g]
aacta™gtcatcctc™aggtaaaaccta™tgctg™caaaagtoagctocc


Candidate
marker 34


21.9/17.4


aatca™ccotgaca[a/c]

cctacatgaatogtaaaaataagmagccaatotaacatggaacctgtagcatataaaaccaatgtota


Candidate
marker 35


27.2/26.1


aatoaagcaaatogaccatgtgaaatctggatactg^gtcccatcmgacaacaatagcctctggnr/g]
gctctcaacacmgtotc


Candidate
marker 36


26.5/28.8


a a^ctgctagtg a agctgg [a/g]

g^mcctgcacctgcaca^agactagtgccacatgaacctgmggcaatoccatoccac^


Candidate
marker 37


25.8/24.8


aato™gatac™caaggaacaaaataatoggtatatggtataagaccaacaccaactcaacatocacg[g/a]
ctacactaacagtga^


Candidate
marker 38


32.2/27.6


aatoagtggaatamcatgtoacaaacacatogaccatagcgagaaagtgcacctctcrr/a]
tatctatcatacctgaatggtctaag



*Candidate marker numbers are consistent with the candidate markers listed in Table 1.

**The first number was the average coverage depth (x) of the nucleotide from the 10 plants susceptible to anthracnose disease; the second number was the 
average coverage depth of the nucleotide from the 10 plants resistant to anthracnose. 

***The first nucleotide of each SNP in bracket for each RAD sequence read was from parental plant Unicrop; the second nucleotide of the SNP in bracket for each 
RAD sequence read was from parental plant Tanjil. 



converted into cost-effective, simple PCR-based markers. Fourthly, the
converted markers were tested on a large number of individual plants of a
segregating population to confirm the genetic linkage between the markers
and the gene of interest before the markers were confidently implemented
in a molecular breeding program. In traditional DNA fingerprinting methods
such as RFLP, RAPD, AFLP and MFLP, the DNA fingerprints are visualized as
DNA bands on gels. By comparison, the "fingerprints" in RAD sequencing are
presented as DNA sequence reads. SNP markers developed from RAD sequencing
are suitable for high-throughput multiplex implementation in molecular
plant breeding on modern SNP genotyping platforms.

The most striking advantage of applying NGS-based RAD sequencing as DNA
fingerprinting in marker development for molecular plant breeding is its
extraordinarily high efficiency. The massive power of NGS technology for
rapid and large-scale marker discovery laid the foundation for the very
fast development of markers linked to the target gene Lanr1 demonstrated
in this study. In the RAD sequencing, we obtained 8207 SNP markers across
the 20 test plants. This number of markers, obtained from a small portion
(1/8) of one sequencing run, is equivalent to months of investigation with
traditional DNA fingerprinting methods. The genetic map of the lupin genome
is approximately 1540 cM long [40]. The 8207 SNP markers therefore provided
an average coverage of about 5.3 markers per cM. In theory, approximately
32 of these SNP markers would lie within 3 cM on either side of the Lanr1
gene (a 6 cM window), or 53 markers within 5 cM on either side of the R
gene (a 10 cM window). Therefore, it was no surprise that 38 markers linked
to the target Lanr1 gene were discovered in this study. The large number of
molecular markers associated with a target gene should provide breeders
with a broad suite of options to choose markers that suit a wide range of
breeding populations in support of molecular plant breeding programs
[13,41,42].
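The marker density figures quoted above follow from simple arithmetic, reproduced in the sketch below. It assumes that the SNPs are spread uniformly along the genetic map, which is an idealisation; the function names are illustrative only.

```python
def markers_per_cm(n_markers, map_length_cm):
    """Average marker density, assuming markers are spread uniformly."""
    return n_markers / map_length_cm


def expected_markers_near_gene(density, half_window_cm):
    """Expected marker count within half_window_cm on either side of a gene."""
    return density * 2 * half_window_cm


density = markers_per_cm(8207, 1540)
print(f"{density:.1f} markers per cM")                        # 5.3 markers per cM
print(round(expected_markers_near_gene(density, 3)))           # 32
print(round(expected_markers_near_gene(density, 5)))           # 53
```

In practice RAD tags cluster according to the distribution of restriction sites and recombination rates, so the observed count near a gene can depart from this expectation.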

A further major advantage of using NGS technology in marker development is
the ease of converting candidate markers into cost-effective, simple
PCR-based markers. In MAS, molecular markers must be cost-effective to
apply to a large number of samples [43]. In traditional DNA fingerprinting
methods such as RAPD, AFLP and MFLP, DNA markers recovered from the gels
must go through a tedious process of DNA fragment isolation,
Table 3 Sequence-specific PCR markers linked to the anthracnose disease resistance gene Lanr1 of Lupinus angustifolius
developed in this study

Marker   Origin                Primer pair   Primer sequences (5' to 3')
AnSeq1   Candidate marker 3*   AnSeq1F       AATOCACAAATOAAAAAC
                               AnSeq1R       GAAGTCAATOATOGTATOGTAC
AnSeq2   Candidate marker 5    AnSeq2F       OTOTCAGAAACAAGGAG
                               AnSeq2R       CAGATOATCTCmAGA^G
AnSeq3   Candidate marker 9    AnSeq3F       GAATOAGAGACAGGACTC
                               AnSeq3R       AGIGI 1 1 1 1 1 IGGTOTCTAG
AnSeq4   Candidate marker 13   AnSeq4F       GAATOTGATGTGAAACAAC
                               AnSeq4R       CTCCTGGCTGAGCmG
AnSeq5   Candidate marker 38   AnSeq5F       GAATCAGTGGAATAmCAT
                               AnSeq5R       CTOGAACCATOAGGTATG

*Candidate marker numbers are consistent with the candidate markers listed in Table 1 and Table 2.



PCR amplification, cloning and sequencing to determine the DNA sequences of
the marker fragments to enable the design of sequence-specific primers
[6,11,18]. Sometimes marker conversion remains problematic even after the
marker bands are sequenced, particularly for dominant markers and for
markers resulting from DNA variations in the restriction sites targeted by
the restriction enzymes employed in DNA fingerprinting; in these cases
further DNA sequence extension after sequencing is required [19,20]. By
contrast, when NGS is used as DNA fingerprinting, the DNA sequences of
candidate markers are already known and ready for primer design. With the
parameters used for candidate marker identification in this study, all
selected markers are co-dominant markers. The length of RAD sequencing
reads in our study was 92 base pairs. The majority of SNP mutation sites
were in the middle of the sequencing reads, which provided enough sequence
length at both ends of the reads to design a pair of sequence-specific
primers flanking the SNP sites to convert the SNP markers into PCR-based
co-dominant markers.

Anthracnose disease resistance in cultivar Tanjil of L. angustifolius is
conditioned by a single dominant gene, Lanr1, which is highly heritable
[12,19]. In this study, three out of the five established
sequence-specific PCR markers, AnSeq1, AnSeq3 and AnSeq4, were closer to
the target gene Lanr1 than the two markers AntjM1 and AntjM2 previously
developed with conventional DNA fingerprinting when 12 plants were used
[12,19]. Two of the newly developed markers, AnSeq3 and AnSeq4, are
co-dominant markers flanking the Lanr1 gene at 0.9 cM. The accuracy of
selecting F2 plants possessing the Lanr1 gene using either marker AnSeq3
or AnSeq4 in marker-assisted selection in lupin breeding will be
approximately 99%; and the accuracy would be 99.9% if both markers are
applied in MAS. Genotyping-based selection using these markers is capable
of distinguishing the homozygous resistant plants (RR) from



Figure 2 Testing of sequence-specific PCR-based molecular markers "AnSeq1", "AnSeq2" and "AnSeq5" on 26 F8 recombinant inbred
lines from a cross of Unicrop (susceptible to anthracnose disease) x Tanjil (resistant) of lupin (Lupinus angustifolius L.). "AnSeq1^R",
"AnSeq2^R" and "AnSeq5^R" indicate the marker allele bands linked to the disease resistance gene Lanr1. "AnSeq1^S", "AnSeq2^S" and "AnSeq5^S" indicate
the marker allele bands associated with the disease susceptibility allele. Disease phenotypes of the RILs are presented as "S" (susceptible) or "R"
(resistant). A marker band with a vertical arrow indicates that a genetic recombination occurred between the R gene and the marker locus on the
chromosome in that particular plant for that particular marker. All other unmarked marker bands showed the correct match between the marker
genotypes and the disease resistance phenotypes on these test plants.
Figure 3 Genetic linkage of sequence-specific PCR-based
molecular markers and the disease resistance gene Lanr1 of
Lupinus angustifolius. Five PCR-based markers, AnSeq1, AnSeq2,
AnSeq3, AnSeq4 and AnSeq5, were developed in this study. The
other two markers, AntjM1 and AntjM2, which were used as
controls, were established previously using traditional marker
development methods [12,19]. Genetic distance in the linkage map is
expressed in centiMorgans. The linkage map was initially
constructed using MapManager QTX [47] and finalized with the RECORD
program [48]. These linked markers are on linkage group "NLL-11"
of the lupin genetic map reported by Nelson et al. [39], as evidenced
by the presence of the same R gene (Lanr1) and the previously
developed marker "AntjM2".



heterozygous resistant plants (Rr) among the F2 progeny
plants resulting from RxS crosses. This allows the
desirable gene to be selected and fixed at an early
generation in the breeding cycle [11-14]. By comparison,
plants selected by traditional disease phenotyping
would include both the Rr and the RR genotypes, so
further selection for disease resistance is still
required in the following breeding cycle because plants
with the Rr genotype segregate. Therefore, genotyping-based
marker-assisted selection is much more cost-effective
than traditional phenotyping-based selection.
The two markers AnSeq3 and AnSeq4 are now replacing
the previously developed markers AntjM1 and AntjM2
for marker-assisted selection in the Australian national
lupin breeding program.

Conclusions 

We have demonstrated that NGS-based RAD sequencing
technology can be integrated into
the marker development protocol for molecular plant
breeding. The sequencing reads generated by
RAD sequencing serve the same function as
the DNA fingerprints produced by traditional DNA
fingerprinting methods for marker development in
molecular plant breeding. The application of NGS-based
technology in marker development provides several
significant advantages over traditional methods. Firstly,
marker development with NGS is very rapid: the entire
RAD sequencing work can be completed in days.
Secondly, dozens of molecular markers linked to a target
gene can be discovered in one sequencing run, in
sharp contrast to traditional DNA fingerprinting
methods, in which only one or a few markers can be
found after months of work. The large number of
linked markers not only allows the
molecular geneticist to choose the marker most closely
linked to the gene, but also offers plant breeders the
option to select markers applicable to a wide range of
crosses in their breeding programs. Thirdly, the DNA
markers obtained by our marker development strategy are
all co-dominant and can readily be converted into
cost-effective, simple PCR-based markers desirable for
high-throughput implementation on modern SNP
genotyping platforms for marker-assisted selection in
molecular plant breeding.

The marker development strategy applied in this study
does not require any prior genome knowledge or genetic
mapping information. This will facilitate its use
across a wide range of plant species.

Methods 

Marker development protocol 

The marker development protocol used in this study
is illustrated in Figure 1. The strategy contained four
stages. Firstly, a cross was made to create a genetic
population segregating for the gene of interest, and
individual plants in the population were phenotyped.
Secondly, NGS-based RAD sequencing was conducted on a
small number of representative plants to identify SNP
markers showing correlation between marker genotypes
and plant phenotypes. Thirdly, candidate SNP markers
were converted into simple PCR markers. Finally, the
PCR markers were tested on a large segregating
population to confirm the genetic linkage between the markers
and the gene of interest, and the markers closely linked
to the gene were selected and applied in molecular plant
breeding (Figure 1).

Plant materials 

A single lupin plant of cultivar Tanjil (resistant to
anthracnose disease) was used as the pollen donor, and
was crossed with a single plant of cultivar Unicrop
(susceptible to anthracnose). F2 seeds from a single F1 plant
were harvested and advanced to F8 recombinant inbred
lines (RILs) by single seed descent with no bias. The
parental lines and the F8 population (consisting of 186 RILs)
were tested against anthracnose disease in both
glasshouse and field trials. Disease resistance or susceptibility
in each line was assessed with the method described by
Thomas et al. [38]. Genetic analysis of anthracnose
resistance in the F2 and F8 populations from this
cross showed that the disease resistance was controlled
by a single dominant R gene, which was designated
Lanr1 [12,19].

Search for candidate markers linked to anthracnose 
resistance by RAD sequencing 

The workflow for the marker development strategy in
this study is illustrated in Figure 1. Selection of test
plants for RAD analysis and the identification of
candidate RAD markers linked to the target gene Lanr1
followed the same principle as in marker development by
MFLP [11-14,19-24]. Twenty plants were used in RAD
sequencing. Ten of the plants were resistant to the
disease, including the parent plant Tanjil and nine
randomly selected resistant RILs. The other 10 plants were
susceptible, consisting of the susceptible parent Unicrop
and nine randomly selected susceptible RILs (Table 1).
RAD sequencing and analysis were carried out separately
for each of the 20 plants.

The RAD sequencing protocol was the same as that
described by Chutimanitsakun et al. [36], except that we used
the restriction enzyme EcoRI (recognition site:
5'-G/AATTC-3'). EcoRI is a more frequent cutter than the
restriction enzyme SbfI used by Chutimanitsakun et al.
[36], resulting in the detection of more markers in RAD
sequencing. Two 100 bp single-end sequencing libraries
were constructed using eight-nucleotide multiplex
identifiers (MIDs) [33]. Each library contained 10 plants, and
each plant was assigned a unique MID barcode. The RAD
products from the 20 plants, together with four controls,
were processed in two lanes on the NGS platform
HiSeq2000. Sequencing data were sorted by plant using the
plant-specific MIDs. Reads from each plant were
clustered into tag reads by sequence similarity (allowing at
most two mismatches between any two reads within each
tag read cluster), and clusters with <2 or >100 reads
were discarded [44].
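
The read sorting and clustering steps described above can be sketched as follows. This is a minimal illustration and not the custom scripts used in the study: the function names are hypothetical, the MID is assumed to occupy the first eight bases of each read, and the greedy clustering order is an assumption (a read joins a cluster only if it differs by at most two bases from every read already in it).

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, mid_to_plant, mid_len=8):
    """Assign each read to a plant by its leading MID and trim the MID.
    Reads whose MID is not in mid_to_plant are dropped."""
    by_plant = defaultdict(list)
    for read in reads:
        plant = mid_to_plant.get(read[:mid_len])
        if plant is not None:
            by_plant[plant].append(read[mid_len:])
    return dict(by_plant)


def cluster_reads(reads, max_mismatches=2, min_reads=2, max_reads=100):
    """Greedily group reads so that any two reads in a cluster differ at
    no more than max_mismatches positions, then discard clusters with
    fewer than min_reads or more than max_reads reads."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if len(cluster[0]) == len(read) and all(
                hamming(read, member) <= max_mismatches for member in cluster
            ):
                cluster.append(read)
                break
        else:
            clusters.append([read])  # no compatible cluster: start a new one
    return [c for c in clusters if min_reads <= len(c) <= max_reads]
```

The pairwise comparison is quadratic in the number of reads, so it is only meant to make the clustering rule concrete, not to process a full sequencing lane.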

Tag reads from the two parental plants were compared
and filtered to remove monomorphic DNA sequences,
leaving only the tag reads with SNP polymorphisms. The
remaining sequences were then compared among all
20 plants to select high-confidence SNPs (Figure 3). All
scripts used above were custom written and followed the
same algorithm as described by Catchen et al.
[44]. The scripts used in this study are available to
researchers upon request. If a SNP marker showed
polymorphic nucleotide genotypes correlating with the
disease resistance and susceptibility phenotypes of the 20
test plants, it was regarded as a candidate marker linked
to the disease resistance gene, based on the same principle
as in candidate marker development with MFLP
fingerprinting [11-14,20-24]. Any marker with more than one
missing data point across the 20 plants was discarded. These
selection criteria effectively eliminated all dominant
markers, because a dominant marker would appear on one
allele but be absent on the other allele of the same locus
(sequencing reads missing either in all the resistant plants
or in all the susceptible plants) [13,20,22].
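
The candidate marker criteria above can be sketched as a simple filter. This is an illustration under stated assumptions rather than the scripts used in the study: the function name and data layout are hypothetical, each plant is assumed to carry a single homozygous allele call (or `None` when no reads were obtained), and "correlating" is taken to mean perfect co-segregation on the test plants.

```python
def is_candidate_marker(genotypes, phenotypes, max_missing=1):
    """Decide whether one SNP is a candidate marker.

    genotypes:  dict plant -> allele call ("A", "C", "G", "T") or None
    phenotypes: dict plant -> "R" (resistant) or "S" (susceptible)
    """
    calls = {plant: genotypes.get(plant) for plant in phenotypes}

    # Discard markers with more than one missing data point.
    missing = sum(call is None for call in calls.values())
    if missing > max_missing:
        return False

    # Alleles observed among resistant and among susceptible plants.
    r_alleles = {c for p, c in calls.items() if c is not None and phenotypes[p] == "R"}
    s_alleles = {c for p, c in calls.items() if c is not None and phenotypes[p] == "S"}

    # Require one allele per phenotype class, and different alleles.
    return len(r_alleles) == 1 and len(s_alleles) == 1 and r_alleles != s_alleles
```

With 10 resistant and 10 susceptible plants, a dominant marker (reads absent from a whole phenotype class) has 10 missing data points and is rejected by the missing-data rule, which mirrors the reasoning in the paragraph above.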

Conversion of candidate SNP markers into sequence- 
specific PCR markers 

As a large number of candidate markers linked to the
Lanr1 gene were identified in this study (Table 1), we
randomly selected five candidate SNP markers for
conversion into simple PCR-based markers. A pair of
sequence-specific primers was designed near each end of
the RAD reads for each selected candidate marker. Since
all the RAD reads started from the EcoRI restriction sites
(5'-G/AATTC-3') (Table 2), the first nucleotide "G" from
the EcoRI recognition sites was included in the forward
primers if necessary, which was the case for markers
AnSeq3, AnSeq4 and AnSeq5 (Table 3). Primers were
designed to have an annealing temperature of approximately
54°C, calculated using the nearest-neighbour model
(https://www.sigmaaldrich.com). DNA fragments of
converted markers were amplified in 10 µl PCR consisting of
1.5 µl template DNA (approximately 100 ng), 0.5 unit of
Taq polymerase (Fisher Biotec, Perth), 5 pmol each of
two sequence-specific primers, 67 mM Tris-HCl
(pH 8.8), 2 mM MgCl2, 16.6 mM (NH4)2SO4, 0.45%
Triton X-100, 4 µg gelatin, and 0.2 mM dNTPs. PCR was
performed on a thermocycler (Hybaid DNA Express)
with each cycle comprising 30 s at 94°C, 30 s at the
annealing temperature (see below), and 1 min at 72°C.
The annealing temperature of the first cycle was 60°C,
and decreased by 0.7°C in each subsequent cycle until the
temperature reached 54°C. The final 25 cycles used an
annealing temperature of 54°C. PCR products were
resolved as single-stranded conformation polymorphisms
(SSCP) [45] on a 6% acrylamide gel using a
Sequi-Gen GT sequencing cell (Bio-Rad). Detailed methods for
running the SSCP gels are described elsewhere [46].
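
The touchdown annealing profile described above can be written out cycle by cycle. The following is a minimal sketch with a hypothetical function name. It assumes that the touchdown phase runs while the temperature is still above 54°C and that the next cycle then starts the 25 cycles at 54°C; the text does not state the total number of cycles.

```python
def touchdown_annealing_schedule(start_c=60.0, step_c=0.7, final_c=54.0,
                                 final_cycles=25):
    """Return the annealing temperature (in °C) of every PCR cycle, in order."""
    # Work in tenths of a degree so repeated subtraction has no float drift.
    start, step, final = (round(x * 10) for x in (start_c, step_c, final_c))

    temps = []
    current = start
    while current > final:            # touchdown phase: 60.0, 59.3, 58.6, ...
        temps.append(current / 10)
        current -= step

    temps.extend([final / 10] * final_cycles)   # constant phase at 54°C
    return temps
```

Under these assumptions the schedule has nine touchdown cycles (60.0°C down to 54.4°C) followed by 25 cycles at 54°C.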

Linkage confirmation between the established markers
and the Lanr1 gene

The five newly established sequence-specific PCR
markers (Table 3) were tested on a segregating population
consisting of 186 F8 RILs derived from the cross Unicrop
x Tanjil. The marker genotyping score data and the
anthracnose disease phenotyping data were merged and
analysed with the software program MapManager QTX
[47] to determine the genetic linkage between the
markers and the Lanr1 gene. The genetic distance was
calculated using the Kosambi function. The linkage map
was initially constructed using MapManager QTX and
finalized with the RECORD program [48]. The two markers
"AntjM1" and "AntjM2", which were previously
developed using traditional DNA fingerprinting methods
[12,19], were included in the linkage analysis as controls.
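
For reference, the Kosambi mapping function converts a recombination fraction r into a map distance d = (1/4) ln[(1 + 2r)/(1 - 2r)] Morgans. The sketch below shows the conversion in both directions; the function names are hypothetical, and it does not reproduce how MapManager QTX estimates recombination fractions from RIL data.

```python
import math


def kosambi_cm(r):
    """Map distance in centiMorgans for a recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm):
    """Inverse mapping: recombination fraction for a distance in centiMorgans."""
    return 0.5 * math.tanh(2 * d_cm / 100.0)
```

For tightly linked markers the two scales nearly coincide: a recombination fraction of 0.009 corresponds to about 0.9 cM.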